Abstract —In this paper, the capacity of the additive white
Gaussian noise (AWGN) channel affected by time-varying
Wiener phase noise is investigated. Tight upper and lower bounds
on the capacity of this channel are developed. The upper bound
is obtained by using the duality approach, and considering a
specific distribution over the output of the channel. In order
to lower-bound the capacity, first a family of capacity-achieving 
input distributions is found by solving a functional optimization 
of the channel mutual information. Then, lower bounds on the 
capacity are obtained by drawing samples from the proposed 
distributions through Monte-Carlo simulations. The proposed 
capacity-achieving input distributions are circularly symmetric, 
non-Gaussian, and the input amplitudes are correlated over time. 
The evaluated capacity bounds are tight for a wide range of 
signal-to-noise-ratio (SNR) values, and thus they can be used to 
quantify the capacity. Specifically, the bounds follow the well- 
known AWGN capacity curve at low SNR, while at high SNR, 
they coincide with the high-SNR capacity result available in the 
literature for the phase-noise channel. 

Index Terms —Phase noise, channel capacity, capacity achieving 
distribution, Wiener process. 

I. Introduction 

PHASE NOISE due to frequency instabilities of radio
frequency oscillators is a limiting factor in high data rate
digital communication systems (e.g., see [1]-[15] and references
therein). Phase noise severely impacts the performance
of systems that employ dense signal constellations [16], [17].
Moreover, the effect of phase noise is more pronounced in high
carrier frequency systems, e.g., E-band (60-80 GHz), mainly
due to the high levels of phase noise in oscillators designed
for such frequencies [18]-[22].

The Shannon capacity of the system can be studied in order 
to investigate the effect of phase noise on the throughput. 
For stationary phase-noise channels, Lapidoth [23] derived an
asymptotic capacity expression that is valid at high SNR.
Capacity of the noncoherent channel, where the transmitted 
signal is affected by uniformly distributed phase noise, has 
been studied in [24]-[27]. In [27], Katz and Shamai derived 
upper and lower bounds on the capacity of the noncoherent 
phase-noise channel for non-asymptotic SNR regimes. They 
showed that the capacity-achieving distribution of the noncoherent
memoryless channel is discrete with an infinite number
of mass points. In [24], [28], the capacity bounds in [27] have 
been extended to the block memoryless phase-noise channel, 
where the phase noise was modeled as a constant over a 
number of consecutive symbols. It was shown in [24] that the 
capacity-achieving input distribution of the block memoryless 
phase-noise channel is not Gaussian (unlike the additive white 
Gaussian noise channel). In [25], the constrained capacity 
of M-ary phase-shift keying over a noncoherent phase-noise 
channel has been investigated. Capacity of partially coherent 
channels, where the phase noise is estimated at the receiver, 
and the signal is affected by the residual phase noise estimation 
errors, has been studied in [29]. The achievable information
rate of the phase-noise channel and methods for calculating
it have been discussed in, e.g., [30]-[32]. The achievable
information rate of multi-carrier radio links affected by phase
noise has been analyzed in [33]. The effects of using multisampling
receivers on the achievable information rate of the phase-noise
channel have recently been investigated in [34]-[36].

There have been a limited number of studies on characterizing
the capacity of channels affected by phase noise with
memory (e.g., [23], [34], [37], [38]). The Wiener phase- 
noise channel that models many practical scenarios belongs 
to this family of channels. In [23], Lapidoth characterized the 
capacity of the Wiener phase-noise channel at high SNR. It 
was shown in [23] that circularly symmetric input alphabets 
with Gamma-distributed amplitudes can achieve the capacity
of the stationary phase-noise channel (with or without
memory) at high SNR. Capacity results of [23] have been
recently extended to multi-antenna systems in [37], [39], 
[40]. However, the capacity-achieving input distribution of the 
Wiener phase-noise channel and the closed-form capacity of 
this channel, valid for all SNR values, have not been derived 
yet. 

Contributions 

In this paper, we derive tight upper and lower bounds on 
the capacity of the additive white Gaussian noise (AWGN) 
channel affected by Wiener phase noise when the channel 
input is subject to an average-power constraint. The upper 
bound on the capacity is found by using the duality approach, 
and considering a specific distribution over the output of the 
channel. We determine a family of input distributions that 
result in a tight lower bound on the capacity. We show that the 
capacity-achieving input distribution is circularly symmetric 
but non-Gaussian. We also show that unlike for memoryless 
channels, the capacity-achieving input alphabets are correlated
over time. Lower bounds on the capacity are obtained by 
numerical calculation of the information rates, achievable 
by samples generated from the proposed input distributions 
through Monte-Carlo simulations. The developed upper and 
lower bounds are tight for a wide range of SNR values. This 
helps to more accurately quantify the capacity of the phase- 
noise channel compared to previously available results in the 
literature (e.g., [23]). 

Organization of the Paper 

The paper is organized as follows. In Section II, the system 
model and the corresponding amplitude-phase channel are 
introduced. Using the amplitude-phase channel, the mutual 
information between the input and the output is examined in 
Section III. In Section IV, a capacity upper bound is derived. In 
Section V, we obtain the closed-form expression for a family 
of capacity-achieving distributions. Finally, in Section VI, the 
proposed lower and upper bounds are compared against each 
other and also the results available in the literature. 

Notation 

With $\mathcal{N}(0, \sigma^2)$ and $\mathcal{CN}(0, \sigma^2)$, we denote the probability
distribution of a real Gaussian random variable, and of a
circularly symmetric complex Gaussian random variable with
zero mean and variance $\sigma^2$. The uniform distribution over the
interval $[0, 2\pi)$ is denoted as $\mathcal{U}(0, 2\pi)$. We use $|\cdot|$ to denote
the absolute value of scalars, and determinant of matrices. The
Euclidean norm of vectors is denoted by $\|\cdot\|$. Finally, $\mathcal{D}(\cdot\|\cdot)$
denotes the relative entropy between two probability distributions.
For notational convenience, $f(x)$ and $f(y)$ refer to two
different probability distribution functions, $f_X(x)$ and $f_Y(y)$,
respectively.

II. System Model 

A. The channel 

The input-output relation of the discrete-time Wiener phase- 
noise channel can be written as [4] 

$$y_k = x_k e^{j\phi_k} + w_k, \tag{1}$$

where $x_k$ is the transmitted symbol, and $w_k$ is circularly symmetric
AWGN, independently distributed from $\mathcal{CN}(0, 2\sigma_w^2)$.
The process $\phi_k$ is the discrete-time Wiener phase noise

$$\phi_k = \phi_{k-1} + \Delta_k, \qquad \Delta_k \sim \mathcal{N}(0, \sigma_\Delta^2). \tag{2}$$

This discrete-time process corresponds to a sampled version of 
a continuous-time Brownian motion process with uncorrelated 
increments.* Samples are taken every Ts seconds, the trans¬ 
mission symbol interval.^ The continuous time process of the 

*For discussions on the limitations of the Wiener phase noise model see 

m. [11], [41]. 

^Note that the system model (1) is derived under the assumption that the 
continuous-time phase-noise process remains constant over the duration of the 
symbol time. This assumption allows us to obtain a discrete-time equivalent 
channel model by sampling at Nyquist rate. As shown recently in [34]-[36], 
by dropping this assumption one may obtain different high-SNR capacity 
characterization. 


corresponding oscillator has a Lorentzian spectrum [7], [42].
This spectrum is fully characterized by a single parameter: the
3 dB single-sided bandwidth, $f_{3\mathrm{dB}}$, which depends on the central
frequency and design technology of the oscillator [22]. The
innovation variance for the discrete-time phase-noise process
is $\sigma_\Delta^2 = 4\pi f_{3\mathrm{dB}} T_s$.³
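The channel model (1)-(2) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the simulation code used in the paper: the function names are hypothetical, the initial phase $\phi_0 = 0$ is assumed, and the real and imaginary parts of $w_k$ each have variance $\sigma_w^2$ so that $w_k \sim \mathcal{CN}(0, 2\sigma_w^2)$.

```python
import cmath
import math
import random


def innovation_variance(f_3db_hz, t_s):
    """sigma_Delta^2 = 4 * pi * f_3dB * T_s, as stated in Section II-A."""
    return 4.0 * math.pi * f_3db_hz * t_s


def simulate_wiener_pn_channel(x, sigma_w2, sigma_delta2, rng):
    """Pass symbols x through y_k = x_k * exp(j*phi_k) + w_k (eq. (1)).

    Assumption: the Wiener process starts from phi_0 = 0.
    Returns the received samples and the realized phase-noise path.
    """
    phi = 0.0
    y, phases = [], []
    for xk in x:
        # Wiener phase noise: phi_k = phi_{k-1} + Delta_k (eq. (2))
        phi += rng.gauss(0.0, math.sqrt(sigma_delta2))
        # Circularly symmetric noise with total variance 2 * sigma_w2
        wk = complex(rng.gauss(0.0, math.sqrt(sigma_w2)),
                     rng.gauss(0.0, math.sqrt(sigma_w2)))
        y.append(xk * cmath.exp(1j * phi) + wk)
        phases.append(phi)
    return y, phases


if __name__ == "__main__":
    rng = random.Random(0)
    y, phases = simulate_wiener_pn_channel([1 + 0j] * 5, 0.01, 0.01, rng)
    print(len(y), len(phases))
```

Passing an explicit `random.Random` instance keeps the sketch reproducible; the hypothetical parameter values in the usage lines are for illustration only.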


B. Amplitude and phase input-output relations 

The input $x_k$ to the channel (1) and the output $y_k$ are
complex numbers, and can be represented in polar form as
$x_k = R_k e^{j\Theta_k}$ and $y_k = r_k e^{j\theta_k}$. In this notation, $R_k$ and $\Theta_k$
denote the amplitude and the phase of the transmitted symbol
$x_k$, while $r_k$ and $\theta_k$ denote the amplitude and the phase of
the received sample $y_k$, respectively. The input-output relations
between the transmitted and received amplitude and phase are

$$r_k = \sqrt{(R_k + w_{k,\parallel})^2 + w_{k,\perp}^2} \tag{3}$$

$$\theta_k = \Theta_k + N_k + \phi_k, \tag{4}$$

where

$$N_k = \arctan\left(\frac{w_{k,\perp}}{R_k + w_{k,\parallel}}\right), \tag{5}$$

and $w_{k,\parallel}$ and $w_{k,\perp}$ denote the parts of $w_k$ that are parallel (in-phase)
and orthogonal to the transmitted signal, respectively. In
the rest of the paper, we investigate the capacity of the phase-noise
channel (1) by considering the equivalent amplitude and
phase channels stated in (3) and (4).
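As a sanity check of the amplitude and phase relations, the following sketch decomposes the additive noise into its in-phase and orthogonal parts and recovers the received amplitude and phase. The function name is hypothetical, and `atan2` is used in place of the arctangent so that the sketch also behaves sensibly in the rare case where the in-phase term is negative; that substitution is an assumption of this illustration.

```python
import cmath
import math


def amplitude_phase(R, Theta, phi, w):
    """Received amplitude r and phase theta for y = R e^{j(Theta+phi)} + w.

    The noise is rotated into the frame of the phase-rotated transmitted
    symbol, which splits it into parallel and orthogonal components.
    """
    w_rot = w * cmath.exp(-1j * (Theta + phi))
    w_par, w_perp = w_rot.real, w_rot.imag
    r = math.hypot(R + w_par, w_perp)          # amplitude relation (3)
    N = math.atan2(w_perp, R + w_par)          # phase perturbation (5)
    theta = (Theta + N + phi) % (2.0 * math.pi)  # phase relation (4), mod 2*pi
    return r, theta


if __name__ == "__main__":
    R, Theta, phi, w = 1.0, 0.3, 0.1, complex(0.05, -0.02)
    y = R * cmath.exp(1j * (Theta + phi)) + w
    r, theta = amplitude_phase(R, Theta, phi, w)
    print(abs(r - abs(y)) < 1e-12)
```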


C. Definition of Capacity 

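The capacity of the phase-noise channel (1) is given by [43]

$$C(\mathrm{SNR}) = \lim_{n\to\infty} \sup_{f(\mathbf{x})} \frac{1}{n} I(\mathbf{x}; \mathbf{y}) \tag{6}$$

$$= \lim_{n\to\infty} \sup \frac{1}{n} I(\mathbf{r}, \boldsymbol{\theta}; \mathbf{R}, \boldsymbol{\Theta}), \tag{7}$$

where $\mathbf{x} = \{x_k\}_{k=1}^n$, $\mathbf{y} = \{y_k\}_{k=1}^n$, $\mathbf{r} = \{r_k\}_{k=1}^n$, $\boldsymbol{\theta} = \{\theta_k\}_{k=1}^n$,
$\mathbf{R} = \{R_k\}_{k=1}^n$, and $\boldsymbol{\Theta} = \{\Theta_k\}_{k=1}^n$. The supremum
in (6) and (7) is computed over all probability distributions on
the input that satisfy

$$\frac{1}{n}\sum_{k=1}^{n} \mathbb{E}\left[|x_k|^2\right] = \frac{1}{n}\sum_{k=1}^{n} \mathbb{E}\left[R_k^2\right] \leq E_s, \tag{8}$$

where $E_s$ is the maximum average power. The SNR is defined
as $\mathrm{SNR} := E_s / 2\sigma_w^2$ throughout the paper.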

³Note that the innovation variance can equivalently be found directly from
the spectrum of the phase-noise process [22].
III. Mutual Information


In (16), we choose D to be the difference matrix defined as 


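The mutual information on the right-hand side (RHS) of (7)
is written as

$$\frac{1}{n} I(\mathbf{r}, \boldsymbol{\theta}; \mathbf{R}, \boldsymbol{\Theta}) \tag{9}$$

$$= \frac{1}{n}\left(h(\mathbf{r}, \boldsymbol{\theta}) - h(\mathbf{r}, \boldsymbol{\theta} | \mathbf{R}, \boldsymbol{\Theta})\right) \tag{10}$$

$$= \frac{1}{n}\left(h(\mathbf{r}) + h(\boldsymbol{\theta}|\mathbf{r}) - h(\mathbf{r}|\mathbf{R}, \boldsymbol{\Theta}) - h(\boldsymbol{\theta}|\mathbf{r}, \mathbf{R}, \boldsymbol{\Theta})\right) \tag{11}$$

$$= \frac{1}{n}\left(h(\mathbf{r}) + h(\boldsymbol{\theta}|\mathbf{r}) - h(\mathbf{r}|\mathbf{R}) - h(\boldsymbol{\theta}|\mathbf{r}, \mathbf{R}, \boldsymbol{\Theta})\right) \tag{12}$$

$$= \frac{1}{n}\left(I(\mathbf{r}; \mathbf{R}) + h(\boldsymbol{\theta}|\mathbf{r}) - h(\boldsymbol{\Theta} + \mathbf{N} + \boldsymbol{\phi} | \mathbf{r}, \boldsymbol{\Theta}, \mathbf{R})\right) \tag{13}$$

$$= \frac{1}{n}\left(I(\mathbf{r}; \mathbf{R}) + h(\boldsymbol{\theta}|\mathbf{r}) - h(\mathbf{N} + \boldsymbol{\phi} | \mathbf{r}, \mathbf{R})\right), \tag{14}$$

where $\boldsymbol{\phi} = \{\phi_k\}_{k=1}^n$ and $\mathbf{N} = \{N_k\}_{k=1}^n$. In (11) the chain
rule for entropy is used, and (12) follows because $\mathbf{r}$ and $\boldsymbol{\Theta}$ are
independent (see (3)). In (13) and (14), we used the definition
of $\theta_k$, given in (4).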

We present the following lemma pertaining to the capacity- 
achieving input distribution, which will be used throughout the 
paper. 


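$$D = \begin{bmatrix} 1 & -1 & 0 & \cdots & 0 \\ 0 & 1 & -1 & \cdots & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 1 & -1 \\ 0 & \cdots & 0 & 0 & 1 \end{bmatrix}, \tag{21}$$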


where $|D| = 1$. The equality in (16) follows from [43,
Eq. 8.71], and in (17), we used that $\log_2(|D|) = 0$. The noise
vector $\mathbf{N} + \boldsymbol{\phi}$ is rearranged in (16) as the difference between
the consecutive noise samples in order to resolve the infinite
memory of the phase-noise process $\phi_k$. By substituting (19)
in (15), we obtain

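$$\frac{1}{n} I(\mathbf{r}, \boldsymbol{\theta}; \mathbf{R}, \boldsymbol{\Theta}) = \frac{1}{n}\left(I(\mathbf{r}; \mathbf{R}) - h(\{a_k\}_{k=1}^n | \mathbf{r}, \mathbf{R})\right) + \log_2 2\pi. \tag{22}$$

In order to find upper and lower bounds on the capacity, we
need to evaluate the first two terms on the RHS of (22).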
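The differencing step that removes the memory of the phase-noise process can be checked numerically. The sketch below builds an upper-bidiagonal difference matrix with unit determinant, applies it to the noise vector $\mathbf{N} + \boldsymbol{\phi}$, and compares the result with the direct definition of the differenced samples in (20). The matrix layout, the ordering of the vector from sample $n$ down to sample 1, and the initial phase $\phi_0 = 0$ are assumptions of this illustration; all function names are hypothetical.

```python
import random


def difference_matrix(n):
    """Upper-bidiagonal matrix: 1 on the diagonal, -1 on the superdiagonal."""
    D = [[0] * n for _ in range(n)]
    for i in range(n):
        D[i][i] = 1
        if i + 1 < n:
            D[i][i + 1] = -1
    return D


def determinant_upper_triangular(D):
    """Determinant of an upper-triangular matrix: product of the diagonal."""
    det = 1
    for i in range(len(D)):
        det *= D[i][i]
    return det


def matvec(D, v):
    return [sum(dij * vj for dij, vj in zip(row, v)) for row in D]


def differenced_noise(N, delta):
    """Return a_1..a_n computed via D and via the direct definition (20)."""
    n = len(N)
    phi, acc = [], 0.0
    for d in delta:          # Wiener process: phi_k = phi_{k-1} + Delta_k
        acc += d
        phi.append(acc)
    v = [N[k] + phi[k] for k in range(n)]   # index 0 holds sample k = 1
    # Order the vector as (v_n, ..., v_1), apply D, then restore the order.
    a_from_D = matvec(difference_matrix(n), v[::-1])[::-1]
    a_direct = [N[0] + delta[0]] + [N[k] - N[k - 1] + delta[k]
                                    for k in range(1, n)]
    return a_from_D, a_direct


if __name__ == "__main__":
    rng = random.Random(0)
    N = [rng.gauss(0, 0.1) for _ in range(6)]
    delta = [rng.gauss(0, 0.05) for _ in range(6)]
    a1, a2 = differenced_noise(N, delta)
    print(max(abs(x - y) for x, y in zip(a1, a2)) < 1e-12)
```

Because the determinant equals one, the linear map leaves the differential entropy unchanged, which is why the $\log_2 |D|$ term vanishes.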


Lemma 1: The capacity-achieving input of the channel (1)
is circularly symmetric, i.e., $\{\Theta_k\}_{k=1}^n$ are independently and
identically distributed from $\mathcal{U}(0, 2\pi)$ and are independent of
$\mathbf{R}$.

Proof: The proof directly follows that of [44, Prop. 7], where 
it is shown that the capacity-achieving input of the fading 
channel with memory is circularly symmetric. ■ 

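Based on (4) and the result of Lemma 1, it can be deduced
that the output phase is also uniformly distributed and,
hence, $h(\boldsymbol{\theta}|\mathbf{r}) = n \log_2 2\pi$. Thus, (14) can be rewritten as

$$\frac{1}{n} I(\mathbf{r}, \boldsymbol{\theta}; \mathbf{R}, \boldsymbol{\Theta}) = \frac{1}{n}\left(I(\mathbf{r}; \mathbf{R}) - h(\mathbf{N} + \boldsymbol{\phi} | \mathbf{r}, \mathbf{R})\right) + \log_2 2\pi. \tag{15}$$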

Next, we use the definition of the phase channel (4), and 
rewrite the second term on the RHS of (15) as 


IV. Capacity Upper Bound 

In this section, an upper bound on the capacity of the phase- 
noise channel (1) is derived. We first find a lower bound for 
the entropy term on the RHS of (22) as follows 


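$$h(\{a_k\}_{k=1}^n | \mathbf{r}, \mathbf{R}) = \sum_{k=1}^{n} h(a_k | a_{k-1}, \ldots, a_1, \mathbf{r}, \mathbf{R}) \tag{23}$$

$$\geq \sum_{k=1}^{n} h(a_k | a_{k-1}, \ldots, a_1, N_{k-1}, \mathbf{r}, \mathbf{R}) \tag{24}$$

$$= \sum_{k=1}^{n} h(a_k | N_{k-1}, \mathbf{r}, \mathbf{R}) \tag{25}$$

$$= \sum_{k=1}^{n} h(N_k + \Delta_k | r_k, R_k) \tag{26}$$

$$= n\, h(N_n + \Delta_n | r_n, R_n). \tag{27}$$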


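$$h(\mathbf{N} + \boldsymbol{\phi} | \mathbf{r}, \mathbf{R})$$

$$= h(D(\mathbf{N} + \boldsymbol{\phi}) | \mathbf{r}, \mathbf{R}) - \log_2 |D| \tag{16}$$

$$= h(N_n - N_{n-1} + \Delta_n, \ldots, N_2 - N_1 + \Delta_2, N_1 + \Delta_1 | \mathbf{r}, \mathbf{R}) \tag{17}$$

$$= h(\{N_k - N_{k-1} + \Delta_k\}_{k=2}^{n}, N_1 + \Delta_1 | \mathbf{r}, \mathbf{R}) \tag{18}$$

$$= h(\{a_k\}_{k=1}^{n} | \mathbf{r}, \mathbf{R}), \tag{19}$$

where

$$a_k \triangleq \begin{cases} N_1 + \Delta_1 & \text{if } k = 1 \\ N_k - N_{k-1} + \Delta_k & \text{if } k > 1. \end{cases} \tag{20}$$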


Here, in (23), the chain rule of differential entropy is used.
In (24), we conditioned on the complete knowledge of the
noise sample $N_{k-1}$, and we used that conditioning reduces
entropy. Equality in (25) holds because $a_k$ is conditionally
independent of $(a_{k-1}, \ldots, a_1)$ given $N_{k-1}$ (see (20)). Finally,
in (27), we assumed that $R_k$, and therefore $r_k$ and $N_k + \Delta_k$,
are stationary. As the channel is stationary, one may show
that there is no gain in choosing a non-stationary input.

We next upper-bound the mutual information on the RHS
of (22) by using the duality approach [45, Th. 5.1]. Let $f(\mathbf{r}|\mathbf{R})$
denote the conditional probability of $\mathbf{r}$ given $\mathbf{R}$, and $f(\mathbf{r})$
denote the distribution of $\mathbf{r}$ for a given input distribution $f(\mathbf{R})$,
and lastly, let $q(\mathbf{r})$ be an arbitrary distribution of $\mathbf{r}$. The mutual
information in (22) can be upper-bounded by using duality
as [45, Th. 5.1]

TABLE I: Numerically calculated values of $\alpha_U(\mu = 0)$ and
$\beta_U(\mu = 0)$, when $E_s = 1$, and for various $\sigma_\Delta^2$ and $\sigma_w^2$.


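$$I(\mathbf{r}; \mathbf{R}) = \mathbb{E}_{f(\mathbf{R})}\left[\mathcal{D}(f(\mathbf{r}|\mathbf{R}) \| f(\mathbf{r}))\right] \tag{28}$$

$$= \mathbb{E}_{f(\mathbf{R})}\left[\mathcal{D}(f(\mathbf{r}|\mathbf{R}) \| q(\mathbf{r}))\right] - \mathcal{D}(f(\mathbf{r}) \| q(\mathbf{r})) \tag{29}$$

$$\leq \mathbb{E}_{f(\mathbf{R})}\left[\mathcal{D}(f(\mathbf{r}|\mathbf{R}) \| q(\mathbf{r}))\right] \tag{30}$$

$$= -\mathbb{E}_{f(\mathbf{r})}\left[\ln(q(\mathbf{r}))\right] - h(\mathbf{r}|\mathbf{R}). \tag{31}$$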


Here, $\mathcal{D}(\cdot\|\cdot)$ denotes the relative entropy between two probability
distributions [43, Eq. 8.46]; (28) follows from the definition
of the mutual information [43, Eq. 8.49]; in (29) we used
Topsøe's identity [46]; (30) follows because of the nonnegativity
of relative entropy [43, Thm. 8.6.1]. Finally, (31)
follows directly from the definition of the relative entropy [43,
Eq. 8.46].

From (30), we see that any choice of the auxiliary output
distribution $q(\mathbf{r})$ results in an upper bound for $I(\mathbf{r}; \mathbf{R})$. However,
$q(\mathbf{r})$ needs to be selected such that a tight upper bound
is obtained. Specifically, we choose the output amplitudes to
be independently distributed from $q(r)$, which is a particular
mixture of a half-normal distribution and a Rayleigh distribution
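The duality argument in (28)-(30) can be illustrated on a small discrete channel, where every quantity is computable exactly. The sketch below is only an analogy under stated assumptions: it uses a two-input, two-output discrete channel with hypothetical transition probabilities rather than the continuous amplitude channel of the paper, and it checks both the identity used in (29) and the resulting upper bound in (30).

```python
import math


def kl(p, q):
    """Relative entropy D(p || q) in bits for discrete distributions."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def output_distribution(px, W):
    """Output law p_Y(j) = sum_i p_X(i) W[i][j]."""
    return [sum(px[i] * W[i][j] for i in range(len(px)))
            for j in range(len(W[0]))]


def mutual_information(px, W):
    """I(X;Y) = E_X[ D(W(.|x) || p_Y) ], the discrete analogue of (28)."""
    py = output_distribution(px, W)
    return sum(px[i] * kl(W[i], py) for i in range(len(px)))


def duality_bound(px, W, q):
    """E_X[ D(W(.|x) || q) ] for an arbitrary output law q, as in (30)."""
    return sum(px[i] * kl(W[i], q) for i in range(len(px)))


if __name__ == "__main__":
    px = [0.3, 0.7]                    # hypothetical input distribution
    W = [[0.9, 0.1], [0.2, 0.8]]       # hypothetical channel law
    q = [0.5, 0.5]                     # arbitrary auxiliary output law
    i_xy = mutual_information(px, W)
    bound = duality_bound(px, W, q)
    gap = kl(output_distribution(px, W), q)
    print(bound >= i_xy, abs(bound - gap - i_xy) < 1e-12)
```

The gap between the bound and the mutual information equals the relative entropy between the true output law and the auxiliary one, which is why a well-chosen auxiliary distribution gives a tight bound.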


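$$q(r) = \frac{\alpha_U(\mu)\, e^{-\beta_U(\mu) r^2}}{\sqrt{\dfrac{\sigma_w^2}{(r + \mu)^2} + \sigma_\Delta^2}}, \qquad r \geq 0. \tag{32}$$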




We will soon motivate the form of $q(r)$. Moreover, in Section VI,
we will show that this choice of $q(r)$ results in a
tight upper bound on the capacity.

In (32), $\mu \geq 0$ is a constant that will be optimized later to
tighten the upper bound. For any $\mu$, the parameters $\alpha_U(\mu)$ and
$\beta_U(\mu)$ should be chosen such that certain constraints on $q(r)$
are satisfied. The first constraint is based on the fact that $q(r)$
is a probability distribution function and, hence, must integrate
to one

$$\int_0^\infty q(r)\, dr = 1. \tag{33}$$

The second constraint is due to the input power constraint (8),
and can be found from (3)

$$\int_0^\infty r^2 q(r)\, dr = E_s + 2\sigma_w^2. \tag{34}$$

Although finding closed-form expressions of $\alpha_U(\mu)$ and
$\beta_U(\mu)$ is not straightforward, it is possible to determine their
values numerically. The numerical method that we used for
computing these parameters is presented in Appendix I, and
Tab. I contains their computed values for $\mu = 0$ and for various
values of $\sigma_\Delta^2$ and $\sigma_w^2$.

The transition between the half-normal and the Rayleigh
distributions in (32) is based on the values of $\sigma_w^2$ and $\sigma_\Delta^2$.
At high SNR, where the phase noise dominates ($\sigma_w^2 \ll \sigma_\Delta^2$),
$q(r)$ is asymptotically half-normal, while it is a Rayleigh
distribution for low SNR values.

This choice of auxiliary output distribution in (32) is
motivated as follows: i) The capacity-achieving distribution
of the Gaussian channel is a normal distribution [43], and
thus the input (and also the output) amplitude follows a
Rayleigh distribution. ii) As shown in [23], a tight upper
bound for the phase-noise channel at high SNR can be found
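$\sigma_\Delta^2$ [rad$^2$]   $\sigma_w^2$          $\alpha_U(\mu = 0)$   $\beta_U(\mu = 0)$
$10^{-2}$      $5 \times 10^{-2}$    0.43    0.88
$10^{-2}$      $5 \times 10^{-3}$    0.17    0.73
$10^{-2}$      $5 \times 10^{-4}$    0.10    0.59
$10^{-2}$      $5 \times 10^{-5}$    0.09    0.53
$10^{-2}$      $5 \times 10^{-6}$    0.08    0.51
$10^{-3}$      $5 \times 10^{-2}$    0.43    0.94
$10^{-3}$      $5 \times 10^{-3}$    0.14    0.92
$10^{-3}$      $5 \times 10^{-4}$    0.05    0.73
$10^{-3}$      $5 \times 10^{-5}$    0.03    0.59
$10^{-3}$      $5 \times 10^{-6}$    0.03    0.53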


by using the duality approach, and considering an optimized
Gamma distribution as an auxiliary distribution on $|y|^2$. In
that case, by following the standard technique for determining
the probability density function of a transformed random
variable [47, Ch. 5], it is straightforward to show that $r = |y|$
follows a half-normal distribution.

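By substituting (32) in (31), we obtain

$$I(\mathbf{r}; \mathbf{R}) \leq -n\,\mathbb{E}\left[\log_2 q(r)\right] - n\,h(r|R) \tag{35}$$

$$= -n \log_2(\alpha_U(\mu)) + n \frac{\beta_U(\mu)}{\ln(2)} \mathbb{E}\left[r^2\right] + \frac{n}{2}\,\mathbb{E}\left[\log_2\left(\frac{\sigma_w^2}{(r + \mu)^2} + \sigma_\Delta^2\right)\right] - n\,h(r|R), \tag{36}$$

where in (35), we used that $r$ is independently and identically
distributed (iid). From (34), we have $\mathbb{E}[r^2] = E_s + 2\sigma_w^2$. By
substituting (27) and (36) in (22), then (22) in (6), we obtain
an upper bound on the capacity as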


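$$C(\mathrm{SNR}) \leq -\log_2\left(\frac{\alpha_U(\mu)}{2\pi}\right) + \frac{\beta_U(\mu)}{\ln(2)}\left(E_s + 2\sigma_w^2\right) + \sup_{f(R)} \left\{ \frac{1}{2}\,\mathbb{E}\left[\log_2\left(\frac{\sigma_w^2}{(r + \mu)^2} + \sigma_\Delta^2\right)\right] - h(r|R) - h(N + \Delta | r, R) \right\} \tag{37}$$

$$= -\log_2\left(\frac{\alpha_U(\mu)}{2\pi}\right) + \frac{\beta_U(\mu)}{\ln(2)}\left(E_s + 2\sigma_w^2\right) + \sup_{f(R)} \mathbb{E}_{f(R)} \left\{ \frac{1}{2}\,\mathbb{E}_{f(w_\parallel, w_\perp)}\left[\log_2\left(\frac{\sigma_w^2}{(r + \mu)^2} + \sigma_\Delta^2\right)\right] - h(r|R) - h(N + \Delta | r, R) \right\}. \tag{38}$$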

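Finally, the capacity of the phase-noise channel (1) can be
bounded as $C(\mathrm{SNR}) \leq C_U(\mathrm{SNR})$, where

$$C_U(\mathrm{SNR}) = \min_{\mu \geq 0} \left\{ -\log_2\left(\frac{\alpha_U(\mu)}{2\pi}\right) + \frac{\beta_U(\mu)}{\ln(2)}\left(E_s + 2\sigma_w^2\right) + \max_{R} G(R) \right\}, \tag{39}$$

and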


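$$G(R) = \frac{1}{2}\,\mathbb{E}\left[\log_2\left(\frac{\sigma_w^2}{\left(\sqrt{(R + w_\parallel)^2 + w_\perp^2} + \mu\right)^2} + \sigma_\Delta^2\right)\right] - h\left(\sqrt{(R + w_\parallel)^2 + w_\perp^2}\right) - h\left(\arctan\frac{w_\perp}{R + w_\parallel} + \Delta \,\Big|\, r\right). \tag{40}$$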


In (39), the expectation over $f(R)$ is upper-bounded by the
maximum, over $R$, of the expression inside the expectation.
Finally, the bound can be tightened by
minimizing over $\mu \geq 0$.

The upper bound in (37) is further simplified and the final
result is presented in the following proposition.


Proposition 2: Capacity of the Wiener phase-noise channel (1)
is upper-bounded as

$$C(\mathrm{SNR}) \leq C_P(\mathrm{SNR}) + o(1), \qquad \mathrm{SNR} \to \infty \tag{41}$$

where $o(1)$ denotes a function that vanishes as SNR grows
large, and

$$C_P(\mathrm{SNR}) = \frac{\beta_U(\mu = 0)}{\ln(2)}\left(E_s + 2\sigma_w^2\right) - \frac{1}{2}\log_2\left(\sigma_w^2 e^2\right) - \log_2 \alpha_U(\mu = 0). \tag{42}$$

Proof. Please refer to Appendix II. ■


As we shall see from simulation results in Section VI, the 
provided upper bound is tight for a wide range of SNR values. 


V. Capacity Achieving Distributions and the 
Capacity Lower Bound 

One approach to find the capacity lower bound is to restrict 
the input to have a particular distribution. However, the input 
distribution must be chosen such that a tight lower bound is 
obtained. In this section, we present a method to intelligently 
choose the distribution of input amplitudes, /(R). 

We first reconsider the amplitude and phase channel models 
in (3) and (4), which at high SNR, reduce to 


$$r_k = \sqrt{(R_k + w_{k,\parallel})^2 + w_{k,\perp}^2} \tag{43}$$

$$\approx R_k + w_{k,\parallel} \tag{45}$$

$$\theta_k = \Theta_k + \arctan\frac{w_{k,\perp}}{R_k + w_{k,\parallel}} + \phi_k \tag{46}$$

$$\approx \Theta_k + N_k + \phi_k, \tag{47}$$

where $N_k = w_{k,\perp}/r_k$. In (47), we used that $\arctan(z) \approx z$
for small $z$. In the following, we study the channel defined in
(45) and (47), and derive the capacity-achieving distribution
for this simplified channel. Note that in this section, $r_k$, $\theta_k$,
and $N_k$ refer to the parameters of the approximate channel to
avoid defining new variables.
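The quality of the high-SNR approximations can be probed with a short Monte Carlo sketch. It compares the exact amplitude with the approximate amplitude $R_k + w_{k,\parallel}$, and the exact phase perturbation with the approximate one $w_{k,\perp}/r_k$, for a fixed input amplitude. The function name and the chosen parameter values are hypothetical, and `atan2` stands in for the arctangent as an implementation convenience.

```python
import math
import random


def high_snr_errors(R, sigma_w2, n_samples, rng):
    """Mean absolute errors of the amplitude and phase approximations.

    Returns (amplitude error, phase error) averaged over n_samples
    draws of the parallel and orthogonal noise components.
    """
    s = math.sqrt(sigma_w2)
    amp_err = 0.0
    phase_err = 0.0
    for _ in range(n_samples):
        w_par = rng.gauss(0.0, s)
        w_perp = rng.gauss(0.0, s)
        r = math.hypot(R + w_par, w_perp)          # exact amplitude
        n_exact = math.atan2(w_perp, R + w_par)    # exact phase perturbation
        amp_err += abs(r - (R + w_par))            # vs. R + w_parallel
        phase_err += abs(n_exact - w_perp / r)     # vs. w_perp / r
    return amp_err / n_samples, phase_err / n_samples


if __name__ == "__main__":
    for R in (1.0, 10.0):
        print(R, high_snr_errors(R, 0.01, 5000, random.Random(0)))
```

With the noise variance held fixed, both errors shrink as the input amplitude grows, which is the regime in which the simplified channel is used.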


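By using the approximate input-output amplitude and phase
relations in (45) and (47), and by following steps similar to
those that lead to (22), we obtain

$$\frac{1}{n} I(\mathbf{r}, \boldsymbol{\theta}; \mathbf{R}, \boldsymbol{\Theta}) = \frac{1}{n}\left(I(\mathbf{r}; \mathbf{R}) - h(\{a_k\}_{k=1}^n | \mathbf{r}, \mathbf{R})\right) + \log_2 2\pi \tag{48}$$

$$= \frac{1}{n}\left(h(\mathbf{r}) - h(\mathbf{r}|\mathbf{R}) - h(\{a_k\}_{k=1}^n | \mathbf{r})\right) + \log_2 2\pi \tag{49}$$

$$= \frac{1}{n}\left(h(\mathbf{r}) - h(\mathbf{w}_\parallel) - h(\{a_k\}_{k=1}^n | \mathbf{r})\right) + \log_2 2\pi \tag{50}$$

$$= \frac{1}{n}\left(h(\mathbf{r}) - h(\{a_k\}_{k=1}^n | \mathbf{r})\right) - \frac{1}{2}\log_2 2\pi e \sigma_w^2 + \log_2 2\pi, \tag{51}$$

where $\mathbf{w}_\parallel = \{w_{k,\parallel}\}_{k=1}^n$. In (49), the first term of (48) is
rewritten by using the definition of mutual information. For the
second term, we used that $\{a_k\}_{k=1}^n$, given $\mathbf{r}$, is independent
of $\mathbf{R}$ because of (47). In (51), we used (45) and that
the entropy of the $n$-dimensional Gaussian-distributed random
variable $\mathbf{w}_\parallel$ is given by [43, Thm. 8.4.1]

$$h(\mathbf{w}_\parallel) = \frac{n}{2} \log_2 2\pi e \sigma_w^2. \tag{52}$$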

Any choice of $f(\mathbf{R})$ results in a lower bound on the
capacity of the approximate amplitude-phase channel. Note
that the infinite memory of the Wiener phase-noise process,
$\phi_k$, is resolved by rearranging the noise vector $\mathbf{N} + \boldsymbol{\phi}$ into
the difference between the consecutive noise samples. This
results in dependency among the samples $\{a_k\}$ (see (16)-(19)),
and motivates us to consider an input distribution that
introduces a limited order of dependency across the amplitudes of
the consecutive symbols. More specifically, we consider block-independent
input amplitudes and confine the optimization
in (7) to the set of input distributions of the form

$$f(\mathbf{R}) = \prod_{k=1}^{n/M} f(\tilde{\mathbf{R}}^{(k)}) = \left(f(\tilde{\mathbf{R}})\right)^{n/M}, \tag{53}$$

where $\tilde{\mathbf{R}}^{(k)}$ are blocks of length $M > 1$ samples⁴ obtained by
dividing the vector of input amplitudes

$$\mathbf{R} = \underbrace{\{R_1, \ldots, R_M\}}_{\tilde{\mathbf{R}}^{(1)}}, \underbrace{\{R_{M+1}, \ldots, R_{2M}\}}_{\tilde{\mathbf{R}}^{(2)}}, \ldots, \underbrace{\{R_{n-M+1}, \ldots, R_n\}}_{\tilde{\mathbf{R}}^{(n/M)}}. \tag{54}$$

The second equality in (53) follows as $\tilde{\mathbf{R}}^{(k)}$ are iid from $f(\tilde{\mathbf{R}})$.
By substituting (51) in (7), and limiting the input distributions
to (53), we obtain

$$C(\mathrm{SNR}) \geq \lim_{n\to\infty} \sup_{f(\tilde{\mathbf{R}})} \left\{ \frac{1}{n}\left(h(\mathbf{r}) - h(\{a_k\}_{k=1}^n | \mathbf{r})\right) \right\} + \log_2 2\pi - \frac{1}{2}\log_2 2\pi e \sigma_w^2. \tag{55}$$


⁴Length $M = 1$ is an uninteresting case because $a_n$ depends at least on
two consecutive symbols.
According to the input-output relation (45), block-independent
input symbols result in block-independent output samples
denoted by

$$\mathbf{r} = \underbrace{\{r_1, \ldots, r_M\}}_{\triangleq\, \tilde{\mathbf{r}}^{(1)}}, \underbrace{\{r_{M+1}, \ldots, r_{2M}\}}_{\triangleq\, \tilde{\mathbf{r}}^{(2)}}, \ldots, \underbrace{\{r_{n-M+1}, \ldots, r_n\}}_{\triangleq\, \tilde{\mathbf{r}}^{(n/M)}}. \tag{56}$$

Thus, a lower bound on the capacity can be found by using
(55) and (56) as

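$$C(\mathrm{SNR}) \geq \lim_{n\to\infty} \sup_{f(\tilde{\mathbf{R}})} \left\{ \frac{1}{n}\left(h(\mathbf{r}) - h(\{a_k\}_{k=1}^n | \mathbf{r})\right) \right\} - \frac{1}{2}\log_2 2\pi e \sigma_w^2 + \log_2 2\pi \tag{57}$$

$$= \lim_{n\to\infty} \sup_{f(\tilde{\mathbf{R}})} \left\{ \frac{1}{M} h(\tilde{\mathbf{r}}^{(1)}) - \frac{1}{n} h(\{a_k\}_{k=1}^n | \mathbf{r}) \right\} - \frac{1}{2}\log_2 2\pi e \sigma_w^2 + \log_2 2\pi \tag{58}$$

$$\geq \sup_{f(\tilde{\mathbf{R}})} \left\{ \frac{1}{M} h(\tilde{\mathbf{r}}^{(1)}) - \lim_{n\to\infty} \frac{1}{n} h(\{a_k\}_{k=1}^n | \mathbf{r}) \right\} - \frac{1}{2}\log_2 2\pi e \sigma_w^2 + \log_2 2\pi \tag{59}$$

$$= \sup_{f(\tilde{\mathbf{R}})} \left\{ \frac{1}{M} h(\tilde{\mathbf{r}}^{(1)}) - \lim_{n\to\infty} h(a_n | a_{n-1}, \ldots, a_1, \mathbf{r}) \right\} - \frac{1}{2}\log_2 2\pi e \sigma_w^2 + \log_2 2\pi \tag{60}$$

$$\geq \sup_{f(\tilde{\mathbf{R}})} \left\{ \frac{1}{M} h(\tilde{\mathbf{r}}^{(1)}) - \lim_{n\to\infty} h\left(a_n \,\big|\, \{a_k\}_{k=n-M+2}^{n-1}, \tilde{\mathbf{r}}^{(n/M)}\right) \right\} - \frac{1}{2}\log_2 2\pi e \sigma_w^2 + \log_2 2\pi. \tag{61}$$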


Here, in (58), we used that $\tilde{\mathbf{r}}^{(k)}$ are iid and stationary, and then
rewrote the first term inside the supremum. In (59) we swapped
the supremum and the limit operations, which resulted in
the inequality. Appendix III describes the steps involved in
the swapping. In (60), the joint entropy is replaced with
an equivalent expression for the differential-entropy rate [43,
Th. 4.2.1]. The inequality in (61) holds because removing
conditioning from the second term increases entropy. By using
the chain rule of entropy on the second term of (61), we obtain

$$C(\mathrm{SNR}) \geq \sup_{f(\tilde{\mathbf{R}})} \left\{ \frac{1}{M} h(\tilde{\mathbf{r}}) + h\left(\{a_k\}_{k=n-M+2}^{n-1} \,\big|\, \tilde{\mathbf{r}}\right) - h\left(\{a_k\}_{k=n-M+2}^{n} \,\big|\, \tilde{\mathbf{r}}\right) \right\} - \frac{1}{2}\log_2 2\pi e \sigma_w^2 + \log_2 2\pi, \tag{62}$$

where the superscript of $\tilde{\mathbf{r}}$ and the limit are removed due
to stationarity. According to (47), the second and the third
terms on the RHS of (62) correspond to the entropy of two


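zero-mean Gaussian random vectors with covariance matrices
defined as

$$\Sigma_{n-1} = \mathrm{cov}\left(\{a_k\}_{k=n-M+2}^{n-1} \,\big|\, \tilde{\mathbf{r}}\right) \tag{63a}$$

$$\Sigma_{n} = \mathrm{cov}\left(\{a_k\}_{k=n-M+2}^{n} \,\big|\, \tilde{\mathbf{r}}\right). \tag{63b}$$

By using (63) and the definition of the entropy for a Gaussian
random variable [43, Thm. 8.4.1], (62) can be rewritten as

$$C(\mathrm{SNR}) \geq \sup_{f(\tilde{\mathbf{R}})} \left\{ \frac{1}{M} h(\tilde{\mathbf{r}}) - \frac{1}{2}\,\mathbb{E}\left[\log_2 2\pi e\, g_r(\tilde{\mathbf{r}})\right] \right\} + \log_2 2\pi - \frac{1}{2}\log_2 2\pi e \sigma_w^2, \tag{64}$$

where $g_r(\tilde{\mathbf{r}}) = |\Sigma_n| / |\Sigma_{n-1}|$.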

In order to find a tight lower bound on the capacity, we need
to find the supremum in (64) by searching over all probability
distributions on $\tilde{\mathbf{R}}$ that satisfy the power constraint (8).
Unfortunately, this optimization is not mathematically tractable
in general. However, the search space reduces at high SNR,
since $f(\tilde{\mathbf{r}})$ converges to $f(\tilde{\mathbf{R}})$. Here, determining the optimized
$f(\tilde{\mathbf{R}})$ becomes equivalent to finding the optimized $f(\tilde{\mathbf{r}})$.
Therefore, we maximize (64) over $f(\tilde{\mathbf{r}})$ by means of functional
optimization. The optimized $f(\tilde{\mathbf{r}})$ is used as $f(\tilde{\mathbf{R}})$, which
is used to evaluate the lower bound for the capacity of the
channel in (1). The optimization steps needed are described in
Appendix IV, and finally, $f(\tilde{\mathbf{R}})$ is found as

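$$f(\tilde{\mathbf{R}}) = \frac{\alpha_L^{(M)} \exp\left(-\beta_L^{(M)} \|\tilde{\mathbf{R}}\|^2\right)}{g_r^{M/2}(\tilde{\mathbf{R}})}, \tag{65}$$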

where the parameters $\alpha_L^{(M)}$ and $\beta_L^{(M)}$ are chosen such that the
following constraints are satisfied. The first constraint is based
on the fact that $f(\tilde{\mathbf{R}})$ is a probability distribution function and,
hence, must integrate to one

$$\int f(\tilde{\mathbf{R}})\, d\tilde{\mathbf{R}} = 1. \tag{66}$$

The second constraint is due to the input power constraint (8)

$$\int \|\tilde{\mathbf{R}}\|^2 f(\tilde{\mathbf{R}})\, d\tilde{\mathbf{R}} = M E_s. \tag{67}$$

To numerically compute $\alpha_L^{(M)}$ and $\beta_L^{(M)}$, a method similar
to that presented in Appendix I is used. Tab. II contains the
calculated values of these parameters in various scenarios.
Note that the superscript $(M)$ shows the dependency of these
parameters on the dimension of $\tilde{\mathbf{R}}$.

In the numerical-result section, we will see that choosing 
the input distribution as proposed in (65) results in a tight 
lower bound on the capacity for a wide range of SNR values 
and phase noise innovation variances. 

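In the following, we evaluate $f(\tilde{\mathbf{R}})$ for $M = 2$ and $M = 3$,
which will be used in our numerical simulations. For $M = 2$,
the argument of the limit in (61) is equal to $h(a_n | \tilde{\mathbf{r}})$. Hence,

$$g_r(\tilde{\mathbf{r}}) = |\Sigma_n| = \mathrm{cov}(a_n | \tilde{\mathbf{r}}) = \sigma_\Delta^2 + \frac{\sigma_w^2}{r_n^2} + \frac{\sigma_w^2}{r_{n-1}^2}. \tag{68}$$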
TABLE II: Numerically calculated values of $\alpha_L^{(M)}$ and $\beta_L^{(M)}$,
for $E_s = 1$, $\sigma_\Delta^2 = 10^{-3}$ and various $\sigma_w^2$.

$\sigma_w^2$          $\alpha_L^{(2)}$   $\beta_L^{(2)}$   $\alpha_L^{(3)}$   $\beta_L^{(3)}$
$5 \times 10^{-2}$    0.509   0.997   0.12500   0.991
$5 \times 10^{-3}$    0.051   0.967   0.00400   0.936
$5 \times 10^{-4}$    0.006   0.825   0.00020   0.756
$5 \times 10^{-5}$    0.001   0.634   0.00003   0.598
$5 \times 10^{-6}$    0.001   0.544   0.00002   0.533

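By using (65) and (68), we obtain the two-dimensional input
distribution as

$$f(\tilde{\mathbf{R}}) = \frac{\alpha_L^{(2)} \exp\left(-\beta_L^{(2)} \|\tilde{\mathbf{R}}\|^2\right)}{\sigma_\Delta^2 + \dfrac{\sigma_w^2}{R_n^2} + \dfrac{\sigma_w^2}{R_{n-1}^2}}. \tag{69}$$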

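For $M = 3$, the covariance matrices in (63) can be
computed as

$$\Sigma_{n-1} = \mathrm{cov}(a_{n-1} | \tilde{\mathbf{r}}) = \sigma_\Delta^2 + \frac{\sigma_w^2}{r_{n-1}^2} + \frac{\sigma_w^2}{r_{n-2}^2} \tag{70}$$

$$\Sigma_{n} = \mathrm{cov}(a_{n-1}, a_n | \tilde{\mathbf{r}}) \tag{71}$$

$$= \begin{bmatrix} \sigma_\Delta^2 + \dfrac{\sigma_w^2}{r_{n-1}^2} + \dfrac{\sigma_w^2}{r_{n-2}^2} & -\dfrac{\sigma_w^2}{r_{n-1}^2} \\ -\dfrac{\sigma_w^2}{r_{n-1}^2} & \sigma_\Delta^2 + \dfrac{\sigma_w^2}{r_{n}^2} + \dfrac{\sigma_w^2}{r_{n-1}^2} \end{bmatrix}. \tag{72}$$

By using (65) we obtain a three-dimensional input distribution

$$f(\tilde{\mathbf{R}}) = \frac{\alpha_L^{(3)} \exp\left(-\beta_L^{(3)} \|\tilde{\mathbf{R}}\|^2\right)}{g_r^{1.5}(\tilde{\mathbf{R}})}, \tag{73}$$

where

$$g_r(\tilde{\mathbf{R}}) = \sigma_\Delta^2 + \frac{\sigma_w^2}{R_n^2} + \frac{\sigma_w^2}{R_{n-1}^2} - \frac{\sigma_w^4 / R_{n-1}^4}{\sigma_\Delta^2 + \dfrac{\sigma_w^2}{R_{n-1}^2} + \dfrac{\sigma_w^2}{R_{n-2}^2}}. \tag{74}$$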


VI. Numerical Results 

In this section, we first evaluate the proposed upper
bounds (39) and (41). Then, we use the derived input distribution
(65) to find lower bounds on the capacity through
numerical simulations. We compare the bounds with the
capacity of the AWGN channel [43, Ch. 9]

$$C_{\mathrm{AWGN}}(\mathrm{SNR}) = \log_2\left(1 + \mathrm{SNR}\right) \tag{75}$$

and the asymptotic high-SNR capacity of the Wiener phase- 
noise channel derived by Lapidoth in [23], 

(76) 

Figs. 1 and 2 illustrate the upper bounds (39) and (42), and 
simulated lower bounds for different SNR values. In Fig. 1, 
the phase noise innovation variance is a\ = 10“^, while it is 
a\ = 10-2 in Fig. 2. 

The upper bound $C_U(\mathrm{SNR})$ in (39) is calculated by means
of Monte-Carlo simulations. More specifically, we compute
$G(R)$ in (40) for given values of $R$ and $\mu$, by drawing samples



Fig. 1: The proposed capacity bounds vs. the capacity of the 
AWGN channel and the asymptotic high-SNR capacity of the 
phase-noise channel from [23]. Here, 4 = 10“^ [rad^j. 



Fig. 2: The proposed capacity bounds vs. the capacity of the 
AWGN channel and the asymptotic high-SNR capacity of the 
phase-noise channel from [23]. Here, 4 = IQ-^ [rad^j. 


from $w_\parallel$, $w_\perp$, and $\Delta$. The first term of $G(R)$ is a double
integral, computed numerically. The differential entropy terms
of $G(R)$ are estimated by using the nearest neighbor estimator [48].

To evaluate the upper bound in (41), we omit the $o(1)$
term and plot $C_P(\mathrm{SNR})$ from (42). Figs. 1 and 2 show
that this asymptotic upper bound expression $C_P(\mathrm{SNR})$
matches $C_U(\mathrm{SNR})$ for SNR values around 10 dB and above.

It can be seen from Figs. 1 and 2 that the upper
bound $C_U(\mathrm{SNR})$ is not tight for SNRs below 10 dB (because
it exceeds the AWGN capacity). An alternative upper bound,
tighter than $C_U(\mathrm{SNR})$, is $\min\{C_U(\mathrm{SNR}), C_{\mathrm{AWGN}}(\mathrm{SNR})\}$.
We observe from the figures that the asymptotic upper
bound $C_P(\mathrm{SNR})$ is a very accurate approximation of this
alternative bound.
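A nearest-neighbor entropy estimator of the kind mentioned above can be sketched as follows for scalar samples. This is a Kozachenko-Leonenko-style estimator using the first nearest neighbor; whether it coincides with the exact estimator of [48] is not claimed here, and the function name is hypothetical.

```python
import math
import random


def nn_entropy_1d(samples):
    """First-nearest-neighbor estimate of differential entropy, in nats.

    Uses h ~ psi(n) - psi(1) + log(2) + mean(log(eps_i)), where eps_i is
    the distance from sample i to its nearest neighbor and the factor 2
    is the length of the one-dimensional unit ball [-1, 1].
    """
    xs = sorted(samples)
    n = len(xs)
    log_sum = 0.0
    for i in range(n):
        # In sorted order the nearest neighbor is one of the two adjacent points.
        left = xs[i] - xs[i - 1] if i > 0 else math.inf
        right = xs[i + 1] - xs[i] if i < n - 1 else math.inf
        eps = min(left, right)
        if eps <= 0.0:
            raise ValueError("duplicate samples are not supported")
        log_sum += math.log(eps)
    # psi(n) - psi(1) equals the harmonic number H_{n-1} for integer n.
    harmonic = sum(1.0 / j for j in range(1, n))
    return harmonic + math.log(2.0) + log_sum / n


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(0.0, 1.0) for _ in range(4000)]
    estimate_bits = nn_entropy_1d(data) / math.log(2.0)
    exact_bits = 0.5 * math.log2(2.0 * math.pi * math.e)
    print(estimate_bits, exact_bits)
```

Dividing by $\ln 2$ converts the estimate to bits, which matches the base-2 logarithms used in the bounds.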

In general, at low SNR, the capacity upper bound approaches
the AWGN capacity (75) because the AWGN dominates
over the phase noise. At high SNR, where the phase
noise dominates, the derived upper bounds follow the high-SNR
capacity of the Wiener phase-noise channel (76). It can
be seen from Figs. 1 and 2 that phase noise starts to dominate
for SNR values larger than approximately $10\log_{10}(1/\sigma_\Delta^2)$ dB.
The exact value of this point can also be analytically found
by intersecting (75) and (76).

In order to numerically find lower bounds on the capacity
of the channel (1), the sum-product algorithm proposed
in [49] for calculation of the information rate of channels
with memory is used. We specifically use the particle-based
implementation of this method, which is introduced in [50].
First, we use the rejection sampling approach [51] to draw
samples from (65) for the input amplitudes. For the phase of
the input samples, we use Lemma 1, and draw independent
samples from $\mathcal{U}(0, 2\pi)$. The generated input samples are
transmitted over the original channel (1), and the achievable
information rate is computed as explained in [50]. In Figs. 1
and 2, the simulated lower bounds with M = 2 and M = 3
are compared against the proposed upper bounds. The particle- 
based method of [50] with 10^ particles over 10^ channel uses 
is employed. It can be seen that the computed lower bounds 
are close to the upper bound for a wide range of SNR values. 
In particular, the input with a higher order of dependency of 
amplitudes (M = 3) results in a tighter lower bound for lower 
SNRs. 
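The rejection sampling step for the input amplitudes can be sketched for blocks of two amplitudes. The sketch assumes the block density is proportional to $\exp(-\beta \|\tilde{\mathbf{R}}\|^2)$ divided by $\sigma_\Delta^2 + \sigma_w^2/R_n^2 + \sigma_w^2/R_{n-1}^2$; that form, the half-normal proposal, and the parameter values in the usage lines are assumptions of this illustration rather than the authors' implementation, and the function names are hypothetical.

```python
import math
import random


def g_r(R1, R2, sigma_w2, sigma_d2):
    """Conditional variance term for blocks of two amplitudes."""
    return sigma_d2 + sigma_w2 / R1 ** 2 + sigma_w2 / R2 ** 2


def sample_block(beta, sigma_w2, sigma_d2, rng):
    """Draw one amplitude pair by rejection sampling.

    Proposal: independent half-normal amplitudes with density
    proportional to exp(-beta * R^2), i.e. variance 1 / (2 * beta).
    Since 1 / g_r <= 1 / sigma_d2, a proposal is accepted with
    probability sigma_d2 / g_r, which lies in (0, 1].
    """
    s = math.sqrt(1.0 / (2.0 * beta))
    while True:
        R1 = abs(rng.gauss(0.0, s))
        R2 = abs(rng.gauss(0.0, s))
        if R1 == 0.0 or R2 == 0.0:
            continue  # avoid division by zero in g_r
        if rng.random() < sigma_d2 / g_r(R1, R2, sigma_w2, sigma_d2):
            return R1, R2


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical parameter values chosen for illustration only.
    blocks = [sample_block(0.6, 5e-5, 1e-3, rng) for _ in range(1000)]
    power = sum(a * a + b * b for a, b in blocks) / len(blocks)
    print(power)
```

The acceptance rate drops when the additive noise variance is large relative to the phase-noise innovation variance, so a tighter proposal may be preferable in that regime.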

Fig. 3 shows $C_P(\mathrm{SNR})$ for different values of phase noise
innovation variance, $\sigma_\Delta^2$. As expected, the bound approaches
the capacity of the AWGN channel without phase noise when 
cr\« 



Fig. 3: The proposed capacity upper bound for various 
compared against the AWGN capacity. 


we obtain

$$\frac{\int_0^\infty r^2 q(r)\, dr}{\int_0^\infty q(r)\, dr} = E_s + 2\sigma_w^2, \tag{77}$$

where the left hand side of (77) is independent of $\alpha_U(\mu)$
based on (32). To find $\beta_U(\mu)$ that satisfies (77), we use
the bisection method. In each iteration of the algorithm, the
integrals involved in (77) are numerically calculated. After
finding $\beta_U(\mu)$, (33) is used to compute $\alpha_U(\mu)$.
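The bisection procedure described above can be sketched as follows. The sketch assumes the auxiliary density has the form $q(r) = \alpha\, e^{-\beta r^2} / \sqrt{\sigma_w^2/(r+\mu)^2 + \sigma_\Delta^2}$, rewritten so that it is well defined at $r = 0$; that form, the Simpson integration rule, the truncation of the integration range, and all function names are assumptions of this illustration. It is not claimed to reproduce the tabulated values.

```python
import math


def shape(r, beta, mu, sigma_w2, sigma_d2):
    """Unnormalized density: exp(-beta r^2) / sqrt(sw2/(r+mu)^2 + sd2)."""
    s = r + mu
    return math.exp(-beta * r * r) * s / math.sqrt(sigma_w2 + sigma_d2 * s * s)


def integrate(f, a, b, steps=2000):
    """Composite Simpson rule with an even number of steps."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def second_moment(beta, mu, sigma_w2, sigma_d2):
    """Return (E[r^2] under the normalized density, normalizing integral)."""
    r_max = 12.0 / math.sqrt(beta)   # exp(-144) tail is negligible
    den = integrate(lambda r: shape(r, beta, mu, sigma_w2, sigma_d2),
                    0.0, r_max)
    num = integrate(lambda r: r * r * shape(r, beta, mu, sigma_w2, sigma_d2),
                    0.0, r_max)
    return num / den, den


def solve_alpha_beta(E_s, sigma_w2, sigma_d2, mu=0.0):
    """Bisection on beta for the power constraint, then alpha by normalization."""
    target = E_s + 2.0 * sigma_w2
    lo, hi = 1e-3, 1e3               # the second moment decreases with beta
    for _ in range(60):
        mid = math.sqrt(lo * hi)     # bisect on a logarithmic scale
        moment, _ = second_moment(mid, mu, sigma_w2, sigma_d2)
        if moment > target:
            lo = mid
        else:
            hi = mid
    beta = math.sqrt(lo * hi)
    _, den = second_moment(beta, mu, sigma_w2, sigma_d2)
    return 1.0 / den, beta


if __name__ == "__main__":
    alpha, beta = solve_alpha_beta(1.0, 0.05, 0.01)
    print(alpha, beta)
```

The ratio form of the power constraint does not involve $\alpha$, so $\beta$ can be found first and $\alpha$ follows from the normalization constraint.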


Appendix II 

Proof of Proposition 2 


VII. Conclusion 

In this paper, we presented methods to develop tight upper
and lower bounds on the capacity of the Wiener phase-noise
channel. A capacity upper bound, tighter than those
available in the literature, is derived. We also derived
analytical expressions for a family of input distributions, which
result in tight lower bounds on the capacity. The proposed
input distributions are circularly symmetric and non-Gaussian.
Moreover, the input amplitudes are correlated over time. The
proposed upper and lower bounds tightly enclose the channel
capacity for a wide range of SNR values. The proposed bounds
reach the AWGN capacity at low SNR. In the limiting regime
of high SNR, only the amplitude of the transmitted signal can
be perfectly recovered, whereas the phase is lost. Therefore, in
that regime, capacity gains from increasing the SNR can only
be achieved through the amplitude channel.

Appendix I 

Numerical Calculation of $\alpha_U(\mu)$ and $\beta_U(\mu)$

In this section, we present the numerical method that is used
for the calculation of $\alpha_U(\mu)$ and $\beta_U(\mu)$. From (33) and (34),


To further simplify the upper bound in (39), we use the
escape-to-infinity property of the capacity-achieving input
distribution [45, Def. 4.11]. We impose the additional constraint
$R \geq R_0$ on the input distribution, where $R_0 > 0$,
and denote the capacity of the channel (1) under this constraint
as $C^{R_0}(\mathrm{SNR})$. By following the same procedure as in
[45, Th. 4.12] and [52, Th. 8], we can show that

$$C(\mathrm{SNR}) = C^{R_0}(\mathrm{SNR}) + o(1), \qquad \mathrm{SNR} \to \infty, \tag{78}$$

where $o(1)$ denotes a function that vanishes as SNR grows
large. This means that the high-SNR behavior of $C(\mathrm{SNR})$ does
not change if the input distribution is constrained to lie outside
a sphere of an arbitrary radius. Considering this new constraint
and repeating the steps leading to (39), we obtain

$$C^{R_0}(\mathrm{SNR}) \leq \min_{\mu \geq 0} \left\{ -\log_2\left(\frac{\alpha_U(\mu)}{2\pi}\right) + \frac{\beta_U(\mu)}{\ln(2)}\left(E_s + 2\sigma_w^2\right) + \max_{R \geq R_0} G(R) \right\}, \tag{79}$$

where $G(R)$ is defined in (40). By choosing $R_0$ to be arbitrarily 
large, we can evaluate $G(R)$ when $R \to \infty$. The first term is 
found as 


Appendix III 

Swapping Supremum and Limit Operations 
lim 

R—>(Xi 


-E 




l0g2 




^ [\/(R + w\\)^ + wl + 


0-A 


= 2 (c^a) ■ (80) 


The equality in (80) follows from the dominated convergence 
theorem, which permits interchanging the limit and the 
expectation operators. 

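For the second term of $G(R)$ we obtain 

$$\begin{aligned}
\lim_{R\to\infty} h\!\left(\sqrt{(R + w_\parallel)^2 + w_\perp^2}\right)
&= \lim_{R\to\infty} h\!\left(R + w_\parallel + O(1/R)\right) && (81)\\
&= \lim_{R\to\infty} h\!\left(R + w_\parallel\right) && (82)\\
&= \tfrac{1}{2}\log_2\!\left(2\pi e\,\sigma_w^2\right). && (83)
\end{aligned}$$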

In (81), we have written the power series of the function inside 
the differential entropy about $R = \infty$. The $O(1/R)$ term 
represents the omitted terms of order $1/R$. In (82), we used 
[45, Lemma 6.9] 

$$\lim_{\epsilon \to 0} h(a + \epsilon b) = h(a). \tag{84}$$

Finally, in (83), we used the fact that translation does not change 
differential entropy, together with the entropy of a 
Gaussian-distributed random variable [43, p. 244]. 
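
The power-series step in (81) can be checked numerically. The sketch below is illustrative only: it fixes two arbitrary noise realizations (the values and the helper name `amplitude_gap` are assumptions, not taken from the paper) and shows that the difference between the exact amplitude and its leading-order term shrinks like 1/R.

```python
import math

def amplitude_gap(R, w_par, w_perp):
    """Exact amplitude minus the leading-order term R + w_par."""
    return math.hypot(R + w_par, w_perp) - (R + w_par)

# Arbitrary fixed noise realizations, chosen only for illustration.
w_par, w_perp = 0.3, -0.7

for R in (1e1, 1e2, 1e3, 1e4):
    gap = amplitude_gap(R, w_par, w_perp)
    # If the gap is O(1/R), then R * gap should settle near a constant.
    print(f"R={R:8.0f}  gap={gap:.3e}  R*gap={R * gap:.4f}")

scaled = 1e4 * amplitude_gap(1e4, w_par, w_perp)
```

A first-order expansion of the square root suggests that R times the gap tends to w_perp^2 / 2, which is the constant the printed column should approach.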

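Similarly, for the third term of $G(R)$, we obtain 

$$\begin{aligned}
\lim_{R\to\infty} h\!\left(\arctan\frac{w_\perp}{R + w_\parallel} + \Delta\right)
&= h(\Delta) && (85)\\
&= \tfrac{1}{2}\log_2\!\left(2\pi e\,\sigma_\Delta^2\right). && (86)
\end{aligned}$$

Here in (86), we again used (84), and the fact that 

$$\lim_{R\to\infty} \arctan\frac{w_\perp}{R + w_\parallel} = 0. \tag{87}$$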
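
As a rough numerical cross-check of the Gaussian entropy expression used in (83) and (86), the sketch below integrates -p log2 p for a zero-mean Gaussian density and compares it with the closed form. The variance value and the helper names are assumptions made for illustration. Treating the phase innovation as an unwrapped Gaussian on [-pi, pi] is also an assumption, reasonable only when its standard deviation is small compared with 2*pi.

```python
import math

def gaussian_pdf(x, var):
    """Zero-mean Gaussian density with variance var."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def entropy_bits(pdf, lo, hi, n=20000):
    """Midpoint-rule estimate of the differential entropy in bits on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        p = pdf(lo + (k + 0.5) * h)
        if p > 0.0:  # skip points where the density underflows to zero
            total -= p * math.log2(p) * h
    return total

var = 0.01  # hypothetical innovation variance in rad^2
numeric = entropy_bits(lambda x: gaussian_pdf(x, var), -math.pi, math.pi)
closed_form = 0.5 * math.log2(2.0 * math.pi * math.e * var)
```

Shifting the density by a constant inside the integration window leaves `numeric` unchanged up to discretization error, which is the translation invariance invoked for (83).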

Substituting (80), (83), and (86) in (79), we obtain 

C»-(SNR)<mi„{^(£. + 2,J,) 

- icr^e2log2a^(p)|. (88) 

From (32) we observe that $q(r)$ becomes asymptotically 
independent of $\mu$ when $R \to \infty$, and thus so do 
$\alpha_u(\mu)$ and $\beta_u(\mu)$. Hence, in this case, 
$\alpha_u(\mu)$ and $\beta_u(\mu)$ can be found for 
an arbitrary $\mu$, and the minimization in (88) can be omitted. 
Without loss of generality, we set $\mu = 0$ and substitute (88) 
in (78) 

C(SNR) < ^\ E,, + 2 al) 

- ^log2f^we^au(F = 0) + o(l). (89) 

This concludes the proof of Proposition 2. 


Footnote to (80): It is straightforward to show that the dominated 
convergence theorem holds by choosing an integrable dominating (majorizing) function such 
as log 2 + cr^) over the function inside the expectation. 


In this section, we describe the steps involved in swapping 
the supremum and the limit operations in (59). We first define 

G(/(R),n) 4 - -M{a4Li|r) ■ (90) 

M n 

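For each fixed $f^*(\mathbf{R})$, $\sup_{f(\mathbf{R})} G(f(\mathbf{R}), n) \ge G(f^*(\mathbf{R}), n)$. 
If $\lim_{n\to\infty} G(f(\mathbf{R}), n)$ exists, we obtain 

$$\lim_{n\to\infty} \sup_{f(\mathbf{R})} G(f(\mathbf{R}), n) \ge \lim_{n\to\infty} G(f^*(\mathbf{R}), n). \tag{91}$$

Since (91) holds for any $f^*(\mathbf{R})$, the supremum also satisfies 
the inequality, thus 

$$\lim_{n\to\infty} \sup_{f(\mathbf{R})} G(f(\mathbf{R}), n) \ge \sup_{f(\mathbf{R})} \lim_{n\to\infty} G(f(\mathbf{R}), n). \tag{92}$$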

Appendix IV 

Optimized Input Distribution 

A. Background on Functional Optimization 

The definition of the capacity stated in (6) is a functional 
optimization problem [53]. A functional is a mapping from 
a function to a real number, e.g., an integral. Some background on 
the calculus of variations [53], which is used later in our 
analysis, is presented here. 

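Consider the functional $F(u)$, defined as 

$$F(u) = \int_\Omega K(\mathbf{x}, u(\mathbf{x}))\, d\mathbf{x}, \tag{93}$$

where $u(\mathbf{x})$ is a real-valued function of a real vector argument 
$\mathbf{x}$, 

$$u : \Omega \to \mathbb{R}. \tag{94}$$

A necessary condition for $u$ to be a stationary point of $F(u)$ 
under the $m$ constraints 

$$\int_\Omega L_i(\mathbf{x}, u(\mathbf{x}))\, d\mathbf{x} = 0, \quad i = 1, 2, \ldots, m, \tag{95}$$

is that the following simplified Euler-Lagrange equation is 
satisfied: 

$$\frac{\partial K}{\partial u} + \sum_{i=1}^{m} \lambda_i \frac{\partial L_i}{\partial u} = 0. \tag{96}$$

The Lagrange multipliers $\lambda_i$ should be chosen to fulfill the 
constraints. Note that $K$ and $L_i$, $i = 1, 2, \ldots, m$, are 
real-valued functions with continuous first partial derivatives. 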
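
To make the stationarity condition concrete, the sketch below checks it on a textbook toy problem rather than on the functional of this paper: maximizing the differential entropy (in nats) of a scalar density under a normalization constraint and a second-moment constraint, whose stationary point is the Gaussian density. The power value `P`, the multiplier values, and the function names are assumptions introduced for this illustration, and the condition is taken in the form "derivative of the integrand plus multiplier-weighted derivatives of the constraint integrands equals zero".

```python
import math

P = 2.0  # hypothetical second-moment (power) constraint for the toy problem

def u(x):
    """Candidate stationary point: zero-mean Gaussian density with variance P."""
    return math.exp(-x * x / (2.0 * P)) / math.sqrt(2.0 * math.pi * P)

# Toy integrands (not the paper's): K = -u ln u, L1 = u, L2 = x^2 u.
# Only derivatives with respect to u enter the condition, so the constants
# of the constraints do not matter here: dL1/du = 1 and dL2/du = x^2.
lam1 = 1.0 - 0.5 * math.log(2.0 * math.pi * P)
lam2 = -1.0 / (2.0 * P)

def residual(x):
    """dK/du + lam1 * dL1/du + lam2 * dL2/du evaluated at the point x."""
    dK_du = -math.log(u(x)) - 1.0
    return dK_du + lam1 * 1.0 + lam2 * x * x

# The residual should vanish for every x if u is a stationary point.
residuals = [residual(x) for x in (-3.0, -1.0, 0.0, 0.5, 2.0)]
```

The multipliers here were obtained by matching the logarithm of the Gaussian density term by term. In a less convenient problem they would be found numerically from the constraints.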
B. Functional Optimization of the Lower Bound 

We evaluate the supremum by employing the functional 
optimization method reviewed in Appendix IV-A. Based on 
(64), the functional that must be maximized is written as 


^(/(f)) = ^ - ;^/(f)log(/(f)) 


- /(f)2 27re gr{v) 

1 

+ log2 - - log2 27re(T- 


df. 


(97) 


Using (97), the function corresponding to K in (93) is 
identified as 

if (?,/(?)) = -^/(r)log2(/(r)(gr(f))^/^) 

(98) 


There are two constraints on /(r) that must be satisfied. 
First, /(r) has to integrate to one. The second constraint can 
be formulated based on the average power constraint of the 
input distribution. Consequently, for Li in (95), we obtain 

7-i(f,/(f)) =/(f) - 1 (99a) 

L2(f,/(r)) = ||r||V(f) - M{E, + al). (99b) 


Finally, by substituting (98) and (99) into the Euler-Lagrange 
equation (96), we obtain the output distribution that maximizes 
(64) as 

/(r) = (iqO) 

As mentioned before, the optimized /(r) is used as /(R) 
to evaluate the lower bound for the capacity of the channel 
in (1). 